Title: "Final Analysis Report on Green Vancouver: A trip across different Vancouver neighbourhoods to explore various tree species using Altair."
Author: Suvin Majithia
Date: April 8, 2024

Introduction:

1.) Clearly state the question or questions that you want to answer from your analysis:

This analysis aims to answer the following questions of interest:

Question 1.) Exploring different Vancouver neighbourhoods: what do they tell me?
Question 2.) Total count of different species of trees planted in Vancouver.
Question 3.) Distribution of different species of trees in Vancouver according to their latitude and longitude.
Question 4.) Street_side_name vs species/genus of trees: different types of trees based on even or odd street side name.

Here I explore a subset of the original dataset called "Vancouver Street Trees". The subset is called "small_unique_vancouver.csv" and has 5000 rows and 21 columns. The original data were obtained from the City of Vancouver's Open Data Portal and are released under the Open Government Licence – Vancouver. I use Altair, a visualization library, to create the plots that answer the questions above.

Vancouver is a beautiful, bustling city on the western coast of Canada in the province of British Columbia. Nature decorates every corner of it, making it one of the most beautiful Canadian cities. I have been in Vancouver for close to 6 years now, and in my opinion the beauty of this city is unparalleled.

I love to hike and observe nature, and this made me curious about the questions I ask in this analysis. The dataset covers various tree species in different Vancouver neighbourhoods, and I believe some of these species might possess medicinal properties or other characteristics unique to that particular species. Therefore, with mountains and ample greenery in Vancouver, let me take you on a trip to explore different Vancouver neighbourhoods and the tree species that exist there.

Why am I asking these specific questions?
1.) Knowing which species of trees grow in which Vancouver neighbourhoods can provide insight into the distribution and diversity of tree species across the city, and tells us more about the biodiversity of Vancouver's neighbourhoods.
2.) The total count of each species of tree planted tells us which species are most common and which are rare, and allows comparisons between neighbourhoods.
3.) The latitude and longitude co-ordinates show the spatial distribution of tree species across Vancouver. I find this information useful for urban planners, ecologists, people involved in environmental habitat preservation etc.
4.) Comparing street_side_name with the species/genus of trees may hint at whether specific climate factors, such as wind patterns or sunlight exposure, were taken into consideration when planting trees on the odd or even side of the street.

Analysis: Import the data: I have imported the libraries needed for the analysis in this report: pandas and numpy for data wrangling, and altair for data visualization.
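The loading step can be sketched as follows. This is a minimal illustration, not the report's own code: the helper name `load_trees` is hypothetical, and the two rows in `sample_csv` are invented so the sketch runs without network access. For the real analysis, the URL of "small_unique_vancouver.csv" would be passed instead.

```python
import io

import pandas as pd


def load_trees(source):
    """Read a street-trees CSV from a URL, a path, or a file-like object."""
    return pd.read_csv(source)


# Tiny in-memory stand-in for the real file; the rows are invented
# for illustration only and are not taken from the dataset.
sample_csv = io.StringIO(
    "species_name,genus_name,neighbourhood_name,street_side_name,latitude,longitude\n"
    "SERRULATA,PRUNUS,KITSILANO,EVEN,49.2680,-123.1600\n"
    "PLATANOIDES,ACER,SUNSET,ODD,49.2200,-123.0900\n"
)

trees = load_trees(sample_csv)
print(trees.shape)   # (number of rows, number of columns)
print(trees.dtypes)  # column types as inferred by pandas
```

On the real subset, `trees.shape` would be expected to report the 5000 rows and 21 columns described above.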

Summarize (describe) the data using appropriate tables and methods: The Vancouver Trees dataset used here is a subset of the original dataset. It consists of a single table, small_unique_vancouver.csv, stored as a .csv file. It contains information about the trees in different Vancouver neighbourhoods, including tree species name, tree genus name, neighbourhood name, street side name, common name of trees, cultivar name etc. The table below summarizes the subset of the original dataset used here.

Explain the dataset: The dataset here is a subset of the original dataset called "Vancouver Street Trees". The subset is called "small_unique_vancouver.csv" and has 5000 rows and 21 columns. The original data were obtained from the City of Vancouver's Open Data Portal and are released under the Open Government Licence – Vancouver.

Explain the columns of interest and how they will contribute to answering your question: The columns of interest for this report are species_name, neighbourhood_name, street_side_name, genus_name, latitude and longitude etc. The columns species_name and neighbourhood_name are used to answer question 1, the distribution of different tree species in different Vancouver neighbourhoods. The column street_side_name is used to find the total count of different tree species on the even or odd side of the street. The columns latitude and longitude are used to answer the question about the distribution of trees according to their geographical co-ordinates. The columns genus_name and street_side_name show the distribution of tree genus according to street side name.

Are there any null values that need to be examined more closely: I counted the null values in each column of the dataset using isnull().sum(). The columns of interest for this report (species_name, neighbourhood_name, street_side_name, genus_name, latitude and longitude) have no null values, so there are none that need to be examined more closely.
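The null check described above can be sketched on a toy table. The rows below are invented for illustration, and the use of `cultivar_name` as a column that contains missing values is an assumption made only to show what a non-zero count looks like.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration; not taken from the real dataset.
trees = pd.DataFrame(
    {
        "species_name": ["SERRULATA", "PLATANOIDES", "RUBRUM"],
        "neighbourhood_name": ["KITSILANO", "SUNSET", "SUNSET"],
        "cultivar_name": ["KWANZAN", np.nan, np.nan],  # assumed to have gaps
    }
)

columns_of_interest = ["species_name", "neighbourhood_name"]

# Count missing values per column.
null_counts = trees.isnull().sum()
print(null_counts)

# True if any column of interest has at least one missing value.
has_nulls = bool(null_counts[columns_of_interest].gt(0).any())
print("Nulls in columns of interest:", has_nulls)
```

Restricting the final check to `columns_of_interest` mirrors the reasoning in the text: nulls elsewhere in the table do not affect the plots as long as the plotted columns are complete.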

Answer your question(s) by creating at least 4 visualizations that will be used to communicate your findings to the reader, with some questions building from and being brought to light by previous visualizations.

Source of the data and its description: The dataset used here is a subset of the original data from the City of Vancouver website. The subset is called "small_unique_vancouver.csv" and has 5000 rows and 21 columns. The dataset used here is obtained from the following url: https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/vancouver_trees.csv The columns of interest for this report include neighbourhood_name, street_side_name, latitude, longitude, species_name etc. They are used to answer the main questions:
Question 1.) Exploring different Vancouver neighbourhoods: what do they tell me?
Question 2.) Total count of different species of trees planted in Vancouver.
Question 3.) Distribution of different species of trees in Vancouver according to their latitude and longitude.
Question 4.) Street_side_name vs species/genus of trees: different types of trees based on even or odd street side name.

Data preprocessing: The analysis in this report uses the columns neighbourhood_name, street_side_name, latitude, longitude, species_name etc. to answer the questions of interest. I checked which columns in the dataset have null values, whether they need to be filled, and what insights could be drawn from them. I have not dropped any columns, and in my observation the columns used in the analysis need no data cleaning or preprocessing.

Visualization 1 for this question: 1.) Which Vancouver neighbourhoods have which tree species, and what properties might be unique to each species?

Visualization 2 for the question: 2.) Total count of different species of trees planted in Vancouver.

Visualization 3 for the question: 3.) Distribution of different species of trees in Vancouver according to their latitude and longitude.

Visualization 4 for the question: 4.) Street_side_name vs species/genus of trees - different types of trees based on even or odd street side name.

Discussion: Summarize and report your insights described in your analysis: I have used 4 different types of graphs for the 4 questions of interest in this report. For question 1.) Which Vancouver neighbourhoods have which tree species, and what properties might be unique to each species? I have used a faceted bar plot to create facets, or subplots, based on the different Vancouver neighbourhoods and the tree species found there. I found that in neighbourhoods such as Victoria-Fraserview, Sunset, Mount Pleasant, West Point Grey etc. you can find almost all of the tree species, while neighbourhoods such as Kitsilano, Dunbar-Southlands etc. have a smaller range of species. The graph plotted for question 1 is called 'plot_one' and has longitude on x axis and latitude on y axis and has a corresponding legend highlighting different tree species using colors.

For question 2.) Total count of different species of trees planted in Vancouver. I have used a heatmap plot with total count on y axis and tree species name on x axis. The graph plotted is called 'plot_two' and has longitude on x axis and latitude on y axis and has a corresponding legend highlighting total count using colors. The important finding is that tree species such as SERRULATA, PLATANOIDES etc. are higher in number across Vancouver.

For question 3.) Distribution of different species of trees in Vancouver according to their latitude and longitude. I have used a scatter plot in the form of a map, using longitude and latitude to place the tree species at their co-ordinates. The graph plotted for question 3 is called 'plot_three' and has longitude on x axis and latitude on y axis and has a corresponding legend highlighting different tree species using colors.

For question 4.) Street_side_name vs species/genus of trees: different types of trees based on even or odd street side name. I have used a line graph to plot the count of each tree genus against the odd or even street side where the trees are planted, with a corresponding legend. The graph plotted for question 4 is called 'plot_four' and has street_side_name on x axis and count on y axis.

Write concluding remarks. Discuss whether this is what you expected: The results of this report match my expectations. The plots produced various observations and insights, which have been discussed in the points above.

Discuss other ideas or questions you might have or want to answer next: Other ideas or questions I wish to answer next include the following:
1.) Why was a particular tree species planted in a particular Vancouver neighbourhood, and what other factors influenced that choice?
2.) Is there any historical, cultural or religious significance associated with a particular tree species?
3.) Which factors were taken into consideration when planting particular tree species on the odd or even side of the street, according to latitude and longitude coordinates?
4.) Does any tree species have medicinal or herbal properties unique to that species?

I think that focusing on these questions later would reveal more insightful patterns about tree species in Vancouver neighbourhoods and would help urban planners, ecologists, botanists etc. with their research and planning.

Dashboard: (two options: you can build a dashboard with either 4 or 2 plots, both choices can get you full marks) Combine the visualizations that you made in your report into an interactive panel and add some interactive components (choose all 4 or just 2 plots). Have at least 2 widgets (Example: drop-down menu, slider, check box, clickable legend) and one plot that acts as a selection tool (Example: clicking on one plot will change the data displayed in another plot). (This requirement is the same for 4 or 2 plots.) For 4 plots: have at least 3 interactive plots in your combined panel. For 2 plots: have one plot that changes based on a selection in the other plot, and both plots interactive with a selection widget. All plots should have clear titles/labels/selection tools.

References:

https://opendata.vancouver.ca/explore/dataset/street-trees/information/?disjunctive.species_name&disjunctive.common_name&disjunctive.height_range_id&disjunctive.on_street&disjunctive.neighbourhood_name
https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv
https://stackoverflow.com/questions/62602367/altair-use-a-field-to-specify-the-domain-of-the-y-axis